ftp.cs.arizona.edu

home *** CD-ROM | disk | FTP | other *** search

/ ftp.cs.arizona.edu / ftp.cs.arizona.edu.tar / ftp.cs.arizona.edu / icon / newsgrp / group94c.txt / 000006_icon-group-sender _Thu Dec 8 20:58:20 1994.msg < prev next >

Wrap

Internet Message Format | 1995-02-09 | 8KB

Received: by cheltenham.cs.arizona.edu; Thu, 8 Dec 1994 14:01:17 MST To: icon-group-l@cs.arizona.edu Date: 8 Dec 1994 20:58:20 +0100 From: antimiro@loria.fr (Valentin Antimirov) Message-Id: <3c7ogs$i9t@myrtille.loria.fr> Organization: CRIN & INRIA-Lorraine - Nancy - FRANCE Sender: icon-group-request@cs.arizona.edu References: <1994Dec7.221649.10939@cs.sfu.ca>, <3c62kb$pnu@caslon.CS.Arizona.EDU> Subject: Re: [Q] Algorithm for Regexp Subsumption Errors-To: icon-group-errors@cs.arizona.edu In article <3c62kb$pnu@caslon.CS.Arizona.EDU>, dave@CS.Arizona.EDU (Dave Schaumann) writes: |> |> In article <1994Dec7.221649.10939@cs.sfu.ca>, |> Martin Vorbeck <vorbeck@cs.sfu.ca> wrote: |> |> } are there any algorithms out there for checking whether a regular |> }expression subsumes another one, ie L(E1) is a subset of L(E2)? I have a |> }"brute-force" solution along these lines: |> } |> }1. Compute equivalent finite automatas A1 (resp A2) for E1 (resp E2). |> }2. Compute A3 = A1 intersected with the complement of A2. |> }3. Test L(A3) = empty ==> E2 subsumes E1. |> } |> }But I wonder if this couldn't be done directly on the regular |> }expressions E1 and E2? |> |> Another possibility would be to compute the minimized DFAs, then |> compute the minimized DFA of the union. If it's the same as either |> of the originals, you've got a match. |> |> Hard to say which method would be quicker. Since they're essentially |> duals of each other, I wouldn't be surprised if it was a wash. |> |> -Dave Yes, it is possible to prove any regular inequality E1 =< E2 (as well as any regular equation E1 = E2) in a pure algebraic way -- without constructing DFA's for the expressions involved. 1. As far as regular equations are concerned, such a way has been described in [1]. It is an algebraic calculus presented in the form of a term-rewriting system. I have written a program in OBJ3 (an algebraic programming language) implementing this calculus. If someone is interested in getting the program (together with quite complete instructions how to run it, examples, etc.), send me please a request by e-mail. 2. Since E1 =< E2 is equivalent to E1 + E2 = E2, the abovementioned calculus can also be used for proving/disproving regular inequalities. However, there exists a more direct (and more efficient) algebraic procedure based on so-called "partial derivatives". Below I outline it. Consider the set Reg of regular expressions on a given alphabet A: Reg ::= 0 | 1 | x | Reg . Reg | Reg + Reg | Reg * where 0 denotes the empty language, 1 is a unit language, x is a letter from A, and the three other operations -- concatenation, regular union, and Kleene star -- have standard meaning. Let Reg0, Reg1 be subsets of Reg defined by the following grammars: Reg0 ::= 0 | x | Reg0 . Reg | Reg . Reg0 | Reg0 + Reg0 Reg1 ::= 1 | Reg1 . Reg1 | Reg1 + Reg | Reg + Reg1 | Reg * One can check that Reg is a disjoint union of Reg0 and Reg1 and that for any a0 \in Reg0 and a1 \in Reg1 the inequality 1 =< a0 is false and 1 =< a1 is true.) Let Set[Reg] denote the set of all finite sets of reg. expressions. DEFINITION ( [3] ) Given x \in A, the function d_x : Reg -> Set[Reg], computing the set of partial derivatives of its argument w.r.t. x, is defined recursively by the following equations: d_x(0) = d_x(1) = {} (* an empty set *), d_x(y) = if y == x then { 1 } else {} fi, d_x(a+b) = d_x(a) \cup d_x(b), d_x(a0.b) = d_x(a0).b, d_x(a1.b) = d_x(a1).b \cup d_x(b), d_x(a*) = d_x(a).a* for all y\in A, a,b \in Reg, a0 \in Reg0, a1 \in Reg1. Here the extension of concatenation _._ to Set[Reg] is defined in the following way: R . 0 = {}, {} . t = {}, {0} . t = {}, {1} . t = {t}, {a} . t = {a.t} if a =/= 0, a =/= 1, (R \cup R').t = (R.t)\cup (R'.t) for any R, R' \in Set[Reg], t \in Reg, t =/= 0, a\in Reg. -------------------- Given a set R \in Set[Reg], let \sum R be either 0 if R={}, or a sum (regular union) a + b + ... of non-zero elements a, b, ... of R. Note that here the regular union _+_ is considered to be associative, commutative and idempotent (while in the above definition of d_x() all regular expressions are treated as free terms). DEFINITION ( [2] ) Given a, b \in Reg, x\in A, and p \in d_x(a), the inequality p =< \sum d_x(b) is called a "partial derivative of a regular inequality a =< b w.r.t. the letter x". Let pd_x(a =< b) be the set of all partial derivatives of a =< b w.r.t. x. Given a set S of inequalities, let pd_x(S)=\bigcup_{s\in S}pd_x(s). ------------------ Iterating this definition, one obtains partial derivatives of a =< b w.r.t. non-empty words on A: pd_{wx}(a =< b) = pd_x(pd_w(a =< b)). for all w\in A^+, x \in A. THEOREM ( [2] ) 1. The set PD(a=<b) = \bigcup_{w\in A^+} pd_w(a =< b) of all possible partial derivatives of a =< b w.r.t. all non-empty words on A is finite and its cardinality is less than or equal to \let(a)*2^\let(b) where \let(r) is the number of occurrences of alphabet letters appearing in the reg. expression r. 2. If a =< b is a valid inequality (i.e., the regular language denoted by a is a subset of one denoted by b), then all its partial derivatives are also valid. If a =< b is not valid, then the set PD(a =< b) contains at least one "trivially inconsistent" inequality of the form p1 =< q0 where p1\in Reg1, q0\in Reg0. ------------------ This gives the following procedure to decide whether a regular inequality a=<b is valid: 1. Check if a=<b is "trivially inconsistent", i.e. if a\in Reg1, b\in Reg0 -- then a=<b = false. 2. Otherwise, compute the set pd_x(a =< b) for all the letters x occuring in a. If one of the new inequalities is "trivially inconsistent", then a=<b is disproved. Otherwise, keep computing partial derivatives of new inequalities until either you get a "trivially inconsistent" inequality, or the set of inequalities becomes saturated (no new partial derivative can be obtained). In the latter case, a=<b is valid -- as well as all the obtained partial derivatives of a=<b. (See [3] for a natural way to compute partial derivatives through so-called non-deterministic linear forms). This is indeed a decision procedure, but its efficiency can be drastically improved using simple "logical" observations. First, it's quite clear that it makes no sense to differentiate inequalities like a =< a, a =< a + b, 0 =< a, 1 =< a1 etc. which are "trivial tautologies". These should just be removed from the set of inequalities to be checked. Second, if two inequalities of the form a =< b and a =< b + c occurs in the set, then the second one can also be removed, since it is a logical consequence of the first one. It is proved in [2] that with these optimizations the procedure in some cases is *EXPONENTIALLY* faster that any one constructing DFA for the expressions involved -- you can check it yourself on the following example: (a* b)* (a a ... a) a * =< (a + b)* a (a+b)...(a+b) ^-------^ ^------------^ n times n-1 times (it is known that the minimal DFA for the right-hand side has 2^n states, but our procedure -- with the optimizations described above -- generates only n+2 partial derivatives to be checked). REFERENCES [1] V.~M. Antimirov and P.~D. Mosses. Rewriting extended regular expressions. {\em Theoretical Comput. Sci.}, 141, 1995. (To appear. Available on WWW-page http://www.daimi.aau.dk/~pdm/rere.ps.Z). [2] V.~M. Antimirov. Partial derivatives of regular expressions and finite automata constructions. (To be presented at STACS'95. Available as tech. report CRIN-94-245). [3] V.~M. Antimirov. Rewriting regular inequalities, October 1994. (Submitted) _______________________________________________________________________________ Valentin ANTIMIROV | Tel.: (33) 83 59 30 15 (direct) CRIN (CNRS) & INRIA-Lorraine, | (33) 83 59 30 00 (INRIA) 615, rue du Jardin Botanique, B.P. 101 | Fax: (33) 83 27 83 19 F54602 VILLERS-LES-NANCY Cedex, | Telex: 850 238 F FRANCE | e-mail: Valentin.Antimirov@loria.fr -------------------------------------------------------------------------------